Abstract

In languages such as Japanese, the use of zeros, unexpressed argu- 
ments of the verb, in utterances that shift the topic involves a risk that the 
meaning intended by the speaker may not be transparent to the hearer. 
However, this potentially undesirable conversational strategy often occurs 
in the course of naturally-occurring discourse. In this chapter, I report on 
an empirical study of 250 utterances with zeros in 20 Japanese newspa- 
per articles. Each utterance is analyzed in terms of centering transitions 
and the form in which centers are realized by referring expressions. I also 
examine lexical subcategorization information, and tense and aspect in 
order to test the hypothesis that the speaker expects the hearer to use 
this information in determining global discourse structure. I explain the 
occurrence of zeros in RETAIN and ROUGH-SHIFT centering transitions, 
by claiming that a zero can only be used in these cases when the shift of 
centers is supported by contextual information such as lexical semantics, 
tense and aspect, and agreement features. I then propose an algorithm by 
which centering can incorporate these observations to integrate centering 
with global discourse structure, and thus enhance its ability for non-local 
pronoun resolution. 



1 Introduction 



Centering Theory is a computational model of discourse interpretation that examines the 
relationship between attentional state, the form of referring expressions, and the control of 
inferential processes. These goals have led to its application to the study of unexpressed 
arguments (henceforth zeros) in topic-oriented languages like Japanese, in which salient 
entities, recoverable by inference in a given context, are freely omitted. Centering predicts 
the preferred interpretation of zeros in situations in which the antecedent of a zero was 
realized as a center in the previous discourse. 

Previous work argues that both syntactic and discourse factors associated with potential
antecedents determine the preferred interpretation of zeros [Kuno, 1972, Kameyama, 1985,
Walker et al., 1990, Iida, 1992, Walker et al., 1994]. For example, a discourse entity realized
as a subject is more likely to serve as the antecedent of a zero than a discourse entity realized
as an object.¹ Walker et al. incorporated certain discourse features into centering with their
proposed rule of ZERO TOPIC ASSIGNMENT (henceforth ZTA). This proposal was motivated
by the observation that a zero that was previously the center of attention (i.e., the Cb) is easily
understood as the continuing center even if it is expressed in a syntactically less salient
argument position. For example, a zero object in an utterance such as (1)c below is the
topic because it was the Cb in the previous utterance. As the topic, the discourse entity
realized in object position is ranked higher on the Cf list than the discourse entity realized
as the subject. This explains the preferred interpretation of the subsequent utterance, (1)d;
this will be discussed in more detail below.



(1) a. Hanako wa siken o oete, kyoositu ni modorimasita. 

Hanako top/subj exam OBJ finish classroom to returned 
Hanako returned to the classroom, having finished her exam. 

b. hon o locker ni simaimasita. 
SUBJ book OBJ locker in took-away 
She put her books in the locker. 

c. Itumo no yooni Mitiko ga deki o tazunemasita. 
always like Mitiko subj obj2 result obj asked 
Mitiko, as usual, asked (Hanako) how she did. 

d. zibun no tokenakkatta mondai o misemasita. 
subj OBJ2 self GEN solve-could-not problem OBJ showed 
(Hanako) showed (Mitiko) the problems which she could not solve. 



In order to further test the feasibility of ZTA and to examine strategies for keeping track 
of centers, this chapter examines the distribution of zeros in naturally occurring Japanese 
newspaper texts. Two initial hypotheses about the use of zeros are given in (2) and (3):



(2) Hypothesis-1 

Zeros are used to continue the center. 



(3) Hypothesis-2 

Full NPs are used to shift the center.



These hypotheses are similar to one tested for Italian in (Di Eugenio, this volume).² I report
on an empirical study based on 250 utterances from a corpus of 20 Japanese newspaper
articles. Each utterance is analyzed in terms of centering transitions and the form in which
centers are realized by referring expressions. I also examine lexical subcategorization
information, and tense and aspect, in order to test the hypothesis that the speaker expects
the hearer to use this information in determining global discourse structure. Figure 1
summarizes the findings on the distribution of centering transitions with respect to the form of
referring expression used in the utterance.

1 The salience of the subject has also been observed in various syntactic phenomena
such as extraction and binding. See Walker et al. (1994) for a discussion of
salience among the arguments of the verb in Japanese.

2 Di Eugenio's hypothesis says, "Typically a null subject signals a CONTINUE,
and a strong pronoun a RETAIN or a SHIFT."

                CONTINUE   RETAIN   SMOOTH-SHIFT   ROUGH-SHIFT
with zero          76         3          34             23
without zero        7        39           9             35
total              83        42          43             58

Figure 1: Distribution of Centering Transitions and Zeros in Japanese newspaper texts

The hypothesis in (2) is confirmed by the distribution of CONTINUE transitions in Figure
1, as compared to the other transitions combined (χ² = 53.932, p < .001). In CONTINUE
transitions, zeros are strongly preferred to NPs: among CONTINUE transitions, 76 cases
appear with a zero and only 7 without, while the other transitions preferentially
realize centers by NPs: there are 60 cases with zeros and 83 cases without.
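The test statistic can be recomputed from the counts in Figure 1. The following is a minimal sketch in Python, assuming a Pearson chi-square on the 2x2 table of CONTINUE versus all other transitions, with and without zeros, and no continuity correction. The function name `chi_square_2x2` is introduced here for illustration only.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Cell counts taken from Figure 1.
with_zero = {"CONTINUE": 76, "RETAIN": 3, "SMOOTH-SHIFT": 34, "ROUGH-SHIFT": 23}
without_zero = {"CONTINUE": 7, "RETAIN": 39, "SMOOTH-SHIFT": 9, "ROUGH-SHIFT": 35}

# Pool every transition other than CONTINUE.
other_with = sum(v for k, v in with_zero.items() if k != "CONTINUE")
other_without = sum(v for k, v in without_zero.items() if k != "CONTINUE")

chi2 = chi_square_2x2(
    with_zero["CONTINUE"], without_zero["CONTINUE"], other_with, other_without
)
print(round(chi2, 3))
```

Under these assumptions the pooled counts are 60 and 83, and the statistic agrees with the value reported above to three decimal places.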



Note that Hypothesis-1 predicts as a corollary that discourse entities ranked higher in the Cf
ranking would tend to be realized by zeros. The preference for zeros in CONTINUE transitions
confirms this tendency and provides additional support for Walker et al.'s rule of ZERO TOPIC
ASSIGNMENT.



However, the second hypothesis in (3) is disconfirmed: while full NPs are more frequent
(83) than zeros (60) in the transitions other than CONTINUE, full NPs are not always used to
shift the center, and zeros frequently are. The distribution of centering transitions in Figure 1
shows that a shift of attentional state is abundant in naturally occurring discourse, as seen by
the frequency of RETAIN and ROUGH-SHIFT, which the centering algorithm prefers the least
[Brennan et al., 1987]. In the Japanese data examined here, these transition states are
identified when a zero cannot take the current center of attention, the Cb, as its antecedent.

Thus, what needs to be explained is the occurrence of zeros in these transitions in which the
Cb changes, where it may be difficult for the hearer to determine which discourse entity is
realized by the zero. How is discourse coherence preserved when two adjacent utterances are
not locally coherent? In the transition state of RETAIN, the rule of ZTA makes it possible
to avoid shifting the Cb, but in a ROUGH-SHIFT transition, there is no link to the prior
utterance, and the Cb must shift.³

The main focus of this chapter is to study the relation of local and global structure in
discourse, by exploring the strategies that a speaker uses to reduce the hearer's inference
load and make the flow of discourse coherent when the antecedent of a zero is not realized
in the immediately preceding utterance. In section 2, I discuss in more detail how centering
works in Japanese and the rule of ZERO TOPIC ASSIGNMENT [Walker et al., 1994]. Then in
section 3, I show how cues such as lexical semantics and tense and aspect can be used to
interpret zeros in utterances that realize ROUGH-SHIFT transitions. On the basis of this
analysis, the following section sketches an algorithm for integrating centering with global
focus, and the final section summarizes the contributions of the chapter.



3 What I call a ROUGH-SHIFT in this chapter is elsewhere called a NO CB transition.
That is, there is no Cb, as no entity from Cf(U_{n-1}) is realized in the current
utterance.
2 Zero Topic Assignment and Disambiguation



In this section, I briefly describe the ZTA rule proposed by Walker et al. and show that
discourse coherence indeed tends to be maintained with the same discourse topic across
utterances. In Walker et al., the centering algorithm specifies two structures for centers,
namely the Cb (backward-looking center) and the Cf (forward-looking centers), and a
set of rules and constraints (see Walker, Joshi and Prince, this volume). Forward-looking
centers are a set of semantic discourse entities associated with each utterance. The Cf
ranking for Japanese according to discourse salience is given in (4).

(4) TOPIC > EMPATHY > SUBJECT > OBJECT2 > OBJECT > OTHERS 

The highest ranked member of the Cf list is called the Cp (preferred center). The Cp
represents a prediction about the Cb of the following utterance. The backward-looking
center is the discourse entity that the utterance most centrally concerns. Discourse
coherence is computed with this distinction between looking back to the previous discourse
with the Cb and projecting preferences for interpretation in subsequent discourse with the
Cp. In other words, the combination of the Cb and the Cp reflects the coherence of the
discourse. The shift of centers is realized when a new entity is introduced as the Cp.
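The data structures just described can be sketched as follows. This is a minimal illustration, not the algorithm of Walker et al.: it assumes that each utterance is given as a list of (entity, role) pairs, and that the Cb is the highest-ranked member of the previous Cf that is realized in the current utterance, which is the usual centering constraint rather than something stated in this section. The function names are hypothetical.

```python
# Salience ranking from (4); a lower index means a more salient role.
RANKING = ["TOPIC", "EMPATHY", "SUBJECT", "OBJECT2", "OBJECT", "OTHERS"]


def rank_cf(realizations):
    """Order (entity, role) pairs into a Cf list by the ranking in (4)."""
    ordered = sorted(realizations, key=lambda pair: RANKING.index(pair[1]))
    return [entity for entity, _ in ordered]


def preferred_center(cf):
    """The Cp is the highest-ranked member of the Cf list."""
    return cf[0] if cf else None


def backward_center(prev_cf, cf):
    """Assumed definition: highest-ranked member of Cf(U_{i-1}) realized in U_i."""
    for entity in prev_cf:
        if entity in cf:
            return entity
    return None


# Utterances (1)b and (1)c, ranked by grammatical role only (no ZTA).
cf_b = rank_cf([("HANAKO", "SUBJECT"), ("BOOK", "OBJECT"), ("LOCKER", "OTHERS")])
cf_c = rank_cf([("RESULT", "OBJECT"), ("HANAKO", "OBJECT2"), ("MITIKO", "SUBJECT")])

print(cf_c)
print(preferred_center(cf_c), backward_center(cf_b, cf_c))
```

With grammatical roles alone, MITIKO is the Cp of (1)c while HANAKO remains its Cb; this is the configuration that ZTA is designed to address.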

These interactions of the Cb and Cp are stated as a set of constraints and rules (Walker, 
Joshi and Prince, this volume). What the constraints and rules amount to is the idea 
that discourse segments that continue centering the same entity are more coherent and 
easier to process than those that repeatedly shift from one center to another. The theory
measures coherence by the hearer's inference load when interpreting a discourse sequence 
[ |Grosz et al, 198(\ , |Grosz et al, 199~ij . 
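The four transition states referred to throughout this chapter can be computed from the Cb of the previous utterance and the Cb and Cp of the current one. The sketch below assumes the standard four-way classification of Brennan et al. [1987]; the definitions are not spelled out in this section, so they should be read as an assumption, and `classify_transition` is a hypothetical name.

```python
def classify_transition(prev_cb, cb, cp):
    """Classify the transition into U_i from Cb(U_{i-1}), Cb(U_i) and Cp(U_i).

    Assumption: an undefined previous Cb (None) counts as "same Cb".
    """
    same_cb = prev_cb is None or cb == prev_cb
    if same_cb:
        return "CONTINUE" if cb == cp else "RETAIN"
    return "SMOOTH-SHIFT" if cb == cp else "ROUGH-SHIFT"


# Values taken from the centering structures shown for the examples below.
print(classify_transition("HANAKO", "HANAKO", "HANAKO"))      # (6)b
print(classify_transition("HANAKO", "HANAKO", "MITIKO"))      # (6)c, Cf2
print(classify_transition("HANAKO", "MITIKO", "MITIKO"))      # (6)d, Cf2
print(classify_transition("COMPANY", "CVD-DEVICE", "COMPANY"))  # (14)c, Cf2
```

Note that when the current utterance has no Cb at all (`cb` is `None`) and the previous one did, this sketch returns ROUGH-SHIFT, in line with the use of that label in this chapter for what is elsewhere called a NO CB transition.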



ZERO TOPIC ASSIGNMENT is a discourse rule which allows a zero to be interpreted as a ZERO
TOPIC. ZTA is applied when there is no CONTINUE transition of the previous center.

(5) Zero Topic Assignment 

When a zero in U_{i+1} represents an entity that was the Cb(U_i), and when no other
CONTINUE transition is available, that zero may be interpreted as the ZERO TOPIC
of U_{i+1}.
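The rule in (5) can be sketched procedurally. The code below is an illustration under stated assumptions, not Walker et al.'s implementation: realizations are given as dictionaries with hypothetical keys `entity`, `role` and `zero`; a ZERO-TOPIC role is assumed to outrank every role in (4); and an ordinary CONTINUE is taken to be available exactly when the Cp under grammatical ranking is already the previous Cb.

```python
# Assumed ranking: (4) extended with ZERO-TOPIC at the top.
RANKING = ["ZERO-TOPIC", "TOPIC", "EMPATHY", "SUBJECT", "OBJECT2", "OBJECT", "OTHERS"]


def rank_cf(realizations):
    """Order realizations into a Cf list by salience of their roles."""
    ordered = sorted(realizations, key=lambda r: RANKING.index(r["role"]))
    return [r["entity"] for r in ordered]


def cf_candidates(realizations, prev_cb):
    """Return the Cf lists a hearer may entertain, ZTA reading first if licensed."""
    plain = rank_cf(realizations)
    candidates = [plain]
    if plain[0] != prev_cb:  # no ordinary CONTINUE is available
        for r in realizations:
            if r["zero"] and r["entity"] == prev_cb:
                # Promote the zero that realizes the previous Cb to ZERO-TOPIC.
                promoted = [dict(x, role="ZERO-TOPIC") if x is r else x
                            for x in realizations]
                candidates.insert(0, rank_cf(promoted))
                break
    return candidates


# Utterance (1)c: Mitiko is the overt subject, Hanako a zero OBJECT2.
utterance_c = [
    {"entity": "MITIKO", "role": "SUBJECT", "zero": False},
    {"entity": "HANAKO", "role": "OBJECT2", "zero": True},
    {"entity": "RESULT", "role": "OBJECT", "zero": False},
]
readings = cf_candidates(utterance_c, prev_cb="HANAKO")
for cf in readings:
    print(cf)
```

The first list corresponds to the ZTA CONTINUE reading and the second to the RETAIN reading obtained from grammatical roles alone. Returning both, rather than only the ZTA reading, reflects the observation that a hearer may entertain the two hypotheses simultaneously.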

The rule allows a zero that has been the Cb in U_{i-1} to continue as the Cp in U_i, even if
it appears in a less salient syntactic position. It explains why the discourse entity Hanako, 
which is realized as the OBJECT2 in (6)c is interpreted as the SUBJECT in (6)d. Consider 
again example (1) repeated here as (6), with the centering data structures: 

(6) a. Hanako wa siken o oete, kyoositu ni modorimasita.
Hanako TOP/SUBJ exam OBJ finish classroom to returned
Hanako returned to the classroom, having finished her exam.

Cb: HANAKO
Cf: [HANAKO, EXAM]

b. hon o locker ni simaimasita.
SUBJ book OBJ locker in took-away
She put her books in the locker.

Cb: HANAKO
Cf: [HANAKO, BOOK, LOCKER]    CONTINUE
c. Itumo no yooni Mitiko ga deki o tazunemasita.
always like Mitiko SUBJ OBJ2 result OBJ asked
Mitiko, as usual, asked (Hanako) how she did.

Cb:  HANAKO
Cf1: [HANAKO, MITIKO, RESULT]    ZTA CONTINUE
      TOP, SUBJ, OBJ
Cf2: [MITIKO, HANAKO, RESULT]    RETAIN
      SUBJ, OBJ2, OBJ


d. zibun no tokenakkatta mondai o misemasita.
SUBJ OBJ2 self GEN solve-could-not problem OBJ showed
(Hanako) showed (Mitiko) the problems which she could not solve.
(Mitiko) showed (Hanako) the problems which she could not solve.

Cb1: HANAKO
Cf1: [HANAKO, MITIKO, PROBLEM]    CONTINUE from Cf1(c)
      SUBJ, OBJ2, OBJ
Cb2: MITIKO
Cf2: [MITIKO, HANAKO, PROBLEM]    SMOOTH-SHIFT from Cf2(c)
      SUBJ, OBJ2, OBJ


The discourse situation in (6) is a case where the hearer may maintain multiple hypotheses
about where the speaker's attention is directed. There are two assumptions available: the
assumption that ZTA applies and the zero is interpreted as the topic, versus the assumption
that subjects are more highly ranked than objects on the Cf. Cf2 of (6)c is the only Cf
possible without ZTA, and represents a RETAIN rather than a CONTINUE. By the formulation
of the ZTA rule above, ZTA is triggered here since no CONTINUE transition is otherwise
available. Cf1 represents a CONTINUE reading due to the ZTA option; HANAKO can be the
Cp even when MITIKO is realized as the subject. This could lead to a potential ambiguity in
(6)d, because it is possible for a hearer to simultaneously entertain both of the Cfs in (6)c.
However, the CONTINUE interpretation which results from the ZTA CONTINUE transition
state is strongly preferred. Walker et al. (1994) reported that 28 out of 34 speakers preferred
the CONTINUE interpretation in (6)d (Z = 4.95, p < .001). The less preferred SMOOTH-SHIFT
interpretation would come from the algorithm's application to Cf2 of (6)c.

Walker et al. make a distinction between the notions of GRAMMATICAL TOPIC and ZERO
TOPIC. The grammatical topic is the wa-marked entity, which is by default predicted to be
the most salient entity. The interaction between the grammatical topic and the zero topic
is observed in (7). Discourse segment (7) uses the wa-marked NP instead of the ga-marked
NP in the ZTA environment of (7)c. Compare the interpretation of (7)d with (6)d.

(7) a. Hanako wa siken o oete, kyoositu ni modorimasita. 

Hanako top/subj exam OBJ finish classroom to returned 
Hanako returned to the classroom, having finished her exam. 

Cb: HANAKO 

Cf: [hanako, exam] 

b. hon o locker ni simaimasita. 
SUBJ book OBJ locker in took-away 
(Hanako) put (her) books in the locker. 

Cb: HANAKO 

Cf: [hanako, book] continue 

c. Itumo no yooni Mitiko wa deki o tazunemasita. 
always like Mitiko top/subj OBJ2 result OBJ asked 
Mitiko, as usual, asked (Hanako) how she did. 
Cb:  HANAKO
Cf1: [HANAKO, MITIKO, RESULT]    ZTA CONTINUE
      ZERO-TOP, TOP/SUBJ, OBJ
Cf2: [MITIKO, HANAKO, RESULT]    RETAIN
      TOP/SUBJ, OBJ2, OBJ




d. zibun no tokenakkatta mondai o misemasita.
SUBJ OBJ2 self GEN solve-could-not problem OBJ showed
(Hanako) showed (Mitiko) the problems which she could not solve.
(Mitiko) showed (Hanako) the problems which she could not solve.

Cb1: HANAKO
Cf1: [HANAKO, MITIKO, PROBLEM]    CONTINUE from Cf1(c)
      SUBJ, OBJ2, OBJ
Cb2: MITIKO
Cf2: [MITIKO, HANAKO, PROBLEM]    SMOOTH-SHIFT from Cf2(c)
      SUBJ, OBJ2, OBJ


The wa marking has the predicted effect. Using the grammatical topic marker wa in (7)c 
dampens ZTA and thus affects the interpretation of (7)d, which is now completely ambiguous. 
The results of experiments reported in [Walker et al., 1994] show that 10 subjects who prefer
an interpretation that depends on ZTA in (6) can no longer get the interpretation in (7). 
In (7)d, only 18 out of 34 subjects prefer the ZTA continue interpretation. Because the 
discourse entity realized as the grammatical topic and indicated by the wa-marked np is 
the Cp by default, it is harder to interpret the zero as the topic. The situation can be 
characterized as a case of competing defaults; some hearers apply the default that the wa- 
marked entity is usually the Cp, and others apply the default that continue interpretations 
are preferred and that zeros realize discourse entities that are ranked highly on the Cf. 



When an ambiguity arises from the use of the WA-marked np in the ZTA environments as 
illustrated in the above example, it is often resolved with additional information provided 
in the subsequent discourse. Consider (8).⁴

(8) a. S International wa sirikon-varee ni kenkyuusyo o kaisetusuru. 

S International top/subj silicon valley in laboratory OBJ establish 
(S International) establishes a laboratory in Silicon Valley. 

b. sutaffu tosite doobunya no keni hutari o sukautosita. 
SUBJ staff as this-field GEN authority 2-people OBJ recruited 
(S International) has recruited two authorities in the field as staff.

Cb: S INTERNATIONAL 

Cf: [S INTERNATIONAL, TWO AUTHORITIES] 

c. Kono kenkyuusyo wa saniibeeru ni kaisetusi, 
this laboratory top/obj SUBJ Sunnyvale in open 

(S International) will open this laboratory in Sunnyvale, 

4 There is no decisive proposal for how complex sentences should be divided and
arranged. In this study, I simply divide a complex sentence into simplex sentences
and arrange them in serial order. The complex sentences which appear in the
data consist of coordinations and compounds with temporal adjunct clauses. A
temporal subordinate clause is followed by the main clause in Japanese, so simple
serial ordering normally preserves their chronological order.






Cb:  S INTERNATIONAL
Cf1: [S INTERNATIONAL, LABORATORY]    ZTA CONTINUE
      ZERO-TOP, TOP/OBJ
Cf2: [LABORATORY, S INTERNATIONAL]    RETAIN
      TOP/OBJ, SUBJ






d. "Oputo-huirumu-kenkyuusyo" to nazukeru.
SUBJ OBJ Opt-film-laboratory as name

(S International) names (the laboratory) Opt-film Laboratory.

Cb: S INTERNATIONAL 

Cf: [S INTERNATIONAL, LABORATORY] 



Recall that the ZTA effects are dampened when the grammatical topic marker wa is used. 
The third sentence yields the situation where the zero topic must compete with the gram- 
matical topic, and the preference for one over the other is hard to determine. The ambiguity 
is resolved after processing the fourth sentence, however, when semantic information about 
the naming relation is provided. In other words, the inference that a newly created thing is 
normally given a name, allows the hearer to hypothesize that the laboratory naturally fills 
the named slot of the naming relation. 

In sum, these observations support the predictions made by centering that the preferred 
interpretation of utterances that contain zeros is one in which discourse coherence is main- 
tained. Furthermore, ZTA allows the hearer to interpret the current utterance as being highly 
coherent with the previous utterance. I have also suggested that in cases where an ambigu- 
ity arises because of the use of ZTA, the speaker will provide additional cues to guide the 
hearer's interpretive process. 



3 The Shift of Attentional Focus 



Now let us consider the prediction that discourse coherence is maintained even when zeros
are used to shift the center. This is the context in which the Cb in utterance U_i is not
realized as the Cp (i.e. the most salient entity in U_i). A new entity is introduced as the Cp,
and the shift of the speaker's attentional focus onto this new entity is indicated. Below, I
examine the interpretation of zeros in RETAIN (discourses (9) and (10)) and ROUGH-SHIFT
(discourses (11) and (12)) transitions. After discussing these examples, I propose some
hypotheses about how zeros are interpreted in these environments.

In (9c) a new center, T Co., is introduced into the discourse and realized as a topic, while
the old center, the student, is realized as an object. Thus the center realized by the student
is ranked lower on the Cf than the center realized by T Co., but the student is still the Cb,
so the centering transition is a RETAIN.

(9) a. Gakusei wa hurii-daiaru-kaado de G-sya e denwasureba, 

students top/subj free-dial card with G. Company to phone 
When students call G. Company with the phone card, 

b. syuusyoku-zyoohoo o muryoo-de erareru. 
SUBJ employment-information OBJ free get-can 
(The student) can get employment information free. 

c. T-sya wa rezyaa-zyoohoo o fakusimiri de teikyuusiteori,
T Co. top/subj leisure info. OBJ OBJ2 fax by provide 

T Co. provides leisure information (to student) by fax, 
In (10c) a new center, the price, is introduced into the discourse and realized as a topic,
and the center for the bank is realized as a subject. Thus the center for the bank is ranked
lower on the Cf than the center for the price, but the bank is still the Cb, so the centering
transition is a RETAIN.

(10) a. Saga Ginkoo wa gasorin-sutando de "banku POS" saabisu o hazimeru. 

Saga Bank top/subj gas station at "Bank POS" service OBJ will start 
Saga Bank will start "Bank POS" service at gas stations. 

b. kaimono-kyaku ni kyassyu-kaado o tukatte-morai,
SUBJ shoppers OBJ2 cash card OBJ use-ask

(the bank) asks shoppers to use a cash card,

c. daikin wa sokuza-ni kokyaku no kooza kara hikiotosu. 
price TOP / OBJ SUBJ immediately customer POSS account from draw 
(the bank) takes the charge immediately from a customer's account. 

In (11c), the only center that provides a link to the prior discourse is the center for the
customer, so that center is the Cb. However, the customer is ranked lower on the Cf than
the center for T. Insurance Co., yielding a ROUGH-SHIFT centering transition.

(11) a. S. ginkoo wa kinyuu-hosyoo-seido no toriatukai mo hazimeru. 

S. Bank TOP/SUBJ money-insurance-system GEN handling OBJ begin
S. Bank will start to handle a money insurance system as well.

b. Kokyaku ga ittei ryookin o haraeba,
customer SUBJ certain fee OBJ pay

If a customer pays a certain fee,

c. T. Insurance Co. ga sono kinyuu-torihiki o hosyoosuru. 
T. Insurance Co. SUBJ that money-transaction OBJ OBJ2 insure 

T Insurance Co. insures the money transaction (to the customer). 

In (12a), the phrase T. Electron introduces a center that is established as the Cb in (12b).
Other discourse entities become the Cb in utterances (12d) to (12f), but in (12g) the center
corresponding to T. Electron is realized by a zero. None of the centers in (12f) serves as an
antecedent for this zero, so this is a ROUGH-SHIFT transition.

(12) a. T. Electron wa Yamanasi-ken Nirasaki-si ni daikibona
T. Electron TOP/SUBJ Yamanasi, Nirasaki-city in big
koozyoo o kaisetusuru.
factory OBJ will-build

T. Electron will open a big factory in Nirasaki City, Yamanasi.

b. (a few sentences about T. Electron)

c. Sinkoozyoo de seisansuru sooti wa TE5000 o
new factory in produce-is devices TOP/SUBJ TE5000 OBJ
seinoo-appu-sita RIE-ettingu-sooti.
power-up-did RIE-etching-devices

The devices produced in the new factory are RIE etching devices,
more powerful than TE5000.

d. 16MDRAM no seisan ni taioodekiru.
SUBJ 16MDRAM GEN production OBJ2 cope-with

(RIE devices) can cope with the production of 16M DRAM.

e. DRAM no syuusekido ga takamaruniture,
DRAM GEN integrality SUBJ increase

As the integrality of DRAM increases,

f. ettyaa no zyuyoo ga hueru tame,
etching-devices GEN demand SUBJ increase since
the demand for etching devices increases, and hence,

g. sinkoozyoo no seisan ni humikitta.
SUBJ new facility GEN production OBJ2 decided

(T. Electron) decided to begin the production in the new facility.



Note that the interpretation of zeros is not particularly problematic in the case of RETAIN;
although the Cb is shifting, the antecedent for the zero is a center from the previous
utterance. Furthermore, in some cases, the RETAIN transitions may have a ZTA CONTINUE option.
However, in the ROUGH-SHIFT transition, no local antecedent of a zero is available and a
center shift is forced. In this second case, the zero's antecedent is not in the immediately
preceding utterance, but must be realized in prior utterances. These cases have been called
RETURN POPS or FOCUS POPS in the literature [Reichman, 1985, Polanyi and Scha, 1984,
Grosz and Sidner, 1986]. See also (Walker, this volume).





                           LEXICAL      TENSE &
                           SEMANTICS    ASPECT     AGREEMENT
ROUGH-SHIFT with zeros        20           6           2

Figure 2: Disambiguation Features for Rough-Shift

If discourse coherence is to be maintained, it seems clear that there must be other cues that
are used to preserve coherence and resolve zeros appropriately. This prediction has turned
out to be correct. To test the hypothesis that shifting centers are associated with contextual
factors that facilitate transitions, such as lexical semantics, agreement information, and tense
and aspect, all the rough shifts with zeros in the corpus (23 of them) were coded for these
features. The results are given in Figure 2.⁵ Below I illustrate the role of these factors in
interpreting zeros when the center shifts, with representative examples from the corpus.



3.1 Interaction with lexical semantics 



Let us take a look at the discourse in (12). The appropriate interpretation of the zero in the
last sentence is constrained by the semantic restriction assigned to the arguments of the verb
'decide'. No entity in (12)f can be a potential antecedent, and the zero must be resolved to
a discourse entity expressed in the previous utterances of the text. In this case, it goes back
to the utterance where T. Electron is available.⁶

(13) T. Electron will open the biggest factory in Nirasaki City, Yamanasi. (T. Electron)
will build (the factory) in the company property adjacent to its General
Laboratory. (T. Electron) will provide a big-scale clean room, and produce etching
devices which can deal with 16M bit dynamic RAM. The total investment amounts
to 5 billion yen and the construction starts this fall. It is expected that (the factory)
will start operation a year later. The devices produced in the new factory
are RIE devices, more powerful than TE5000. (RIE devices) can cope with the production
of 16M DRAM. As the integrality of DRAM increases, the demand for etching
devices increases, and hence, (T. Electron) decided to begin the production in the
new facility.

5 The total in the table exceeds the 23 occurrences of
the rough-shift transition with zeros. This is due to the fact that there are some
cases where two features (e.g. lexical semantics and tense) are employed at the same
time.

6 The part indicated by italics is the segment given in (12).

If we assume that the antecedent of a zero is any of the centers introduced in the previous 
discourse, the interpretation of the last sentence would be ambiguous; there are multiple 
potential candidates even if lexical information is brought to bear. Note that Nirasaki City 
and General Laboratory are semantically legitimate antecedents of the missing subject of the 
deciding-situation described by the last sentence. The uncontroversial interpretation with 
T. Electron as the antecedent suggests that a discourse entity that has not been previously 
realized as the Cb cannot be interpreted as the cospecifier of a zero. 

Discourse coherence can be maintained by an inference process based on lexical semantics,
but the preferred interpretation is not always computed by an inference process purely
driven by the underlying semantics. Instead, discourse information such as attentional focus 
and salience provides constraints on the application of information from lexical semantics. 
This interaction is key for enhancing centering by incorporating disambiguation information 
from other sources. 

This claim is further supported by the observation in (14). If we assume that the antecedent 
of a zero can be any of the entities that were previously realized in a discourse, nothing stops 
the zero in the third utterance from taking doosya ('the company') in the first utterance as 
its antecedent since this would yield a semantically plausible ROUGH-SHIFT interpretation. 
However, this interpretation is never preferred over the interpretation obtained by a more 
highly ranked centering transition. That is, no interpretation based on lexical semantics is 
preferred to an interpretation that is ranked higher in terms of centering transitions. The 
preferred interpretation according to the centering rules cannot be overridden unless this 
interpretation is semantically anomalous. 
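The interaction just described can be stated as a two-step selection: discard readings that are semantically anomalous, then choose the reading with the most highly ranked centering transition. The sketch below assumes the transition ordering of Brennan et al. [1987] (CONTINUE, RETAIN, SMOOTH-SHIFT, ROUGH-SHIFT) and assumes that semantic plausibility has already been judged for each reading; the dictionary keys and the function name are hypothetical.

```python
# Assumed preference order among transitions, most preferred first.
TRANSITION_ORDER = ["CONTINUE", "RETAIN", "SMOOTH-SHIFT", "ROUGH-SHIFT"]


def choose_reading(readings):
    """Pick the best-ranked transition among semantically plausible readings."""
    plausible = [r for r in readings if r["plausible"]]
    if not plausible:
        return None
    return min(plausible, key=lambda r: TRANSITION_ORDER.index(r["transition"]))


# The two readings of the zero subject of 'adopt' in (14)c, both plausible.
readings_14c = [
    {"antecedent": "COMPANY", "transition": "ROUGH-SHIFT", "plausible": True},
    {"antecedent": "CVD-DEVICE", "transition": "SMOOTH-SHIFT", "plausible": True},
]
print(choose_reading(readings_14c)["antecedent"])

# Hypothetical variant: if the better-ranked reading were anomalous,
# the lower-ranked but plausible reading would be selected instead.
variant = [
    {"antecedent": "COMPANY", "transition": "ROUGH-SHIFT", "plausible": True},
    {"antecedent": "CVD-DEVICE", "transition": "SMOOTH-SHIFT", "plausible": False},
]
print(choose_reading(variant)["antecedent"])
```

Filtering before ranking, rather than ranking by plausibility, is what encodes the claim that lexical semantics can veto a centering preference but cannot outrank one.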

(14) a. doosya wa 15-dai no hanbai o mikondeiru.
company TOP/SUBJ 15-piece GEN sales OBJ anticipate
The company anticipates the sales of 15 machines.

Cb: COMPANY
Cf: [COMPANY, SALES]

b. CVD-sooti wa CERAUS.
CVD-device TOP/SUBJ CERAUS
The CVD device is (called) CERAUS.

Cb: COMPANY
Cf: [CVD-DEVICE, CERAUS]    RETAIN

c. maruti-tyenbaa-hoosiki o saiyoo.
SUBJ multi-chamber system OBJ adopt
(CVD-device) adopts a multi-chamber system.

Cb1: CVD-DEVICE
Cf1: [CVD-DEVICE, SYSTEM]    SMOOTH-SHIFT
      SUBJ, OBJ
Cb2: CVD-DEVICE
Cf2: [COMPANY, SYSTEM]    ROUGH-SHIFT
      SUBJ, OBJ




d. tahaisen-maku ni taioo-dekiru.
SUBJ multi-wired film OBJ2 deal-can
(CVD-device) can deal with multi-wired films.

Cb1: CVD-DEVICE
Cf1: [CVD-DEVICE, FILMS]    CONTINUE
      SUBJ, OBJ2
Cb2: COMPANY
Cf2: [COMPANY, FILMS]    SMOOTH-SHIFT
      SUBJ, OBJ2


The lexical semantics of the verb saiyoo ('adopt') in (14)c would not block the company in
(14)a from being realized as its subject. For instance, both 'The CVD-device adopts a
multi-chamber system' and 'The company adopts a multi-chamber system' are reasonable readings
of (14)c. However, the Cf2 reading, which is obtained on the basis of lexical semantics and
yields the ROUGH-SHIFT transition, is not preferred to the Cf1 SMOOTH-SHIFT reading. The
preference assigned to (14)c based on centering transitions is seen in (14)d.

The verb taioo in (14)d means 'answer' or 'respond' when a human being or an organization
is the subject, and it normally takes an abstract noun such as 'demand' or 'a political
crisis' as its object. The verb also takes a non-agentive entity as the subject, in which case
it expresses the subject's applicability to some other object expressed in the non-subject
position. The missing subject of the sentence in (14)d, which has a concrete object in the
OBJECT2 position, therefore naturally refers to the CVD device rather than the company,
meaning that the CVD device can handle multi-wired films. The preferred interpretation of
(14)d thus supports the preference computed in utterance (14)c based on the centering
transitions; the interpretation which preserves discourse coherence between discourse
segments is the one most preferred.

Thus, lexical semantics can be used to resolve the interpretation of zeros, as long as its 
interaction with discourse information about attentional state is taken into consideration. 



3.2 Interaction with tense and aspect 



It is not always the case that lexical semantics provides a cue. Observe the following exam- 
ples. 

(15) a. T. Electron wa hiitaa-koozyoo no kensetu ni tyakusyusita.
T. Electron TOP/SUBJ heater factory GEN construction OBJ began(PAST)
T. Electron began the construction of its heater factory.

b. koremade kyoodaigaisya kara kyookyuu o uketeita ga,
until now SUBJ brother-company from supply OBJ received but

Until now (T. Electron) has been receiving the supply from its brother company,

c. zisya-seisan ni kirikaeteiku.
SUBJ self-production OBJ2 introduce

(T. Electron) will introduce self-production.

d. Hiitaa-koozyoo wa itagane-koozyoo ni rinsetusite kensetusuru.
Heater factory TOP/OBJ SUBJ steel factory to adjacent construct
(T. Electron) is constructing the heater factory next to the steel factory.

e. hiraya-date de, yukamenseki 658 heihoo-meetoru.
SUBJ one-story is floor space 658 square meter

(The heater factory) is a one-story building with a floor space of 658 square meters.

f. CVD-sooti-yoo hiitaa o seisansuru.
SUBJ CVD-device-for heater OBJ produce

(The heater factory) will produce heaters for CVD-devices.
     g. Toosigaku wa 2-oku 8-sen man yen da.
        investment-money TOP/SUBJ 280 million yen is
        The investment money amounts to 280 million yen.

        Cb: HEATER FACTORY
        Cf: [investment, 280 million yen]   RETAIN
            SUBJ, COMP

h. san'nin no gizyutusya o Sagami ni gizyutusyuutoku tame hakensita. 
SUBJ three GEN technician OBJ Sagami to technical training for sent 

(T. Electron) sent three technicians to Sagami for technical training. 



        Cb1: INVESTMENT MONEY
        Cf1: [heater factory, technician]   ROUGH-SHIFT
             SUBJ, OBJ

        Cb2: INVESTMENT MONEY
        Cf2: [T. Electron, technician]   ROUGH-SHIFT
             SUBJ, OBJ



No entity in (15)g is suitable as an antecedent of the zero in (15)h: the investment money
is never interpreted as the sender in (15).⁷ That is, the ROUGH-SHIFT transition is forced
in order to make sense of (15)h, and the zero looks for its potential antecedent in the previous
utterances. There are two entities whose semantics is compatible with what the verb of
the sentence requires as its argument. That is, it is plausible to say either 'The heater
factory (as an organization, though the construction of its building has not been completed)
sent technicians' or 'T. Electron sent technicians'. However, the second reading is
preferred. I assume that the shift is supported by the use of the past tense:⁸ the
attentional focus in (15)h returns to an event which has been completed at the time of the
utterance. Note that T. Electron has been mentioned as an entity which conducted some
past action at the beginning of the text.

The example illustrates how inference based on temporal/aspectual information can be
used to resolve ambiguity when no local constraints are available. Such information is used
to control the flow of information, indicating the shift of the reference point in describing
events. In other words, temporal/aspectual coherence participates in an inference system to
maintain non-local coherence, and it provides a cue to identify discourse segments and their
non-local hierarchical relations in discourse.

7 Here heater factory and T. Electron, realized in the previous discourse segments,
are potential antecedents of the zero because they both meet the constraints on
the antecedency of zeros and the semantics of the verb. However, the following
alternative analysis would be possible. The introduction of a new entity, toosigaku
('investment money'), in (15)g may indicate that this entity is associated with an
entity that has already been introduced in the discourse. That is, we can assume
that there is a functional dependency relation between heater factory and investment
money; investment money is the money for establishing the heater factory. In other
words, the heater factory might be implicitly realized in (15)h though it is not overtly
expressed. More research should be done to formalize when such an implicit relation
is realized. A statistical measure of co-occurrence of NPs may be useful to identify
potential attributes associated with an entity. For instance, a company may have
attributes NAME, LOCATION, OWNED-BY, PRODUCT, NET-WORTH, NATIONALITY,
THE NUMBER OF EMPLOYEES and so on.

8 Tense in Japanese is realized as the morpheme attached to the verb stem. In
general, for the [-stative] verbs, the simple present (or non-past) tense is marked
with -u, while the simple past (or perfect) tense is marked with -ta. The present tense form
of [-stative] verbs usually refers to future time unless they represent habitual or
generic actions, in which case they refer to present time (Kuno 1973). The past form
represents an action that has been completed or executed at reference time.



3.3 Interaction with agreement 



The third strategy to maintain discourse coherence is one that uses different types of agree- 
ment information in order to elicit adequate inference and eliminate an undesired potential 
interpretation. Consider example (16). 

(16) a. S. Metal wa zisedaigata ettingu-sooti o kaihatu,
        S. Metal TOP/SUBJ next-generation-type etching-device OBJ develop
        S. Metal has developed next-generation type etching devices,

     b. kotosi kara honkakuteki-na maaketingu o hazimeteiru.
        SUBJ this year from full-scale marketing OBJ begin
        (S. Metal) has started full-scale marketing this year.

     c. (a few sentences about the etching device)

     d. CVD-sooti wa kore ni tuzuku mono de,
        CVD-device TOP/SUBJ this OBJ2 follow thing be
        CVD devices are the thing that will follow this (i.e. etching devices).

     e. habahiroi zyuyoo ga kitaisareteiru.
        wide demand SUBJ is-expected
        A wide range of demand is expected.

     f. tomoni ECR o riyoositeori,
        SUBJ both ECR OBJ use
        (CVD devices and etching devices) both use ECR,

The adverb tomoni ('both') in (16)f indicates that the unexpressed subject in the utterance
refers to a set of two entities. Considering the previous discourse, we see that the entities
which are of the same type and can form a set in this discourse segment are etching devices
and CVD devices. Without this quantifier-like adverb, the zero could refer to S. Metal,
which is a legitimate antecedent of the zero by itself.

Although Japanese does not mark number distinctions (i.e. singular vs. plural) on nouns,
classifiers are used when the number or the quantity does matter: 2 cups of tea, 3 individuals
of professors, 5 things of apples and so on. Expressions which are sensitive to number,
as in (16), can thus be used to make an adequate grouping among the entities in a discourse
and prune an undesired interpretation which is otherwise predicted or never eliminated by
basic discourse coherence principles.
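
The pruning effect of tomoni can be illustrated with a small sketch. It is only a toy model under stated assumptions: the function name `candidate_pairs` and the semantic type labels ("company", "device") are hypothetical and are not part of the analysis above. The sketch merely shows that requiring a set of two entities of the same type leaves a single candidate set for the zero in (16)f.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def candidate_pairs(entities: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return all pairs of distinct entities that share a semantic type.

    `entities` maps an entity name to an (assumed) semantic type label.
    """
    return [
        (a, b)
        for a, b in combinations(entities, 2)
        if entities[a] == entities[b]
    ]


# Entities realized in (16)a-(16)e; the type labels are illustrative assumptions.
entities_16 = {
    "S. Metal": "company",
    "etching devices": "device",
    "CVD devices": "device",
}

# With tomoni ('both'), only same-type pairs are candidates for the zero.
pairs = candidate_pairs(entities_16)
print(pairs)
```

On this toy input the only pair that survives is the two device entities, so S. Metal, which would be a legitimate antecedent on its own, is no longer a candidate.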



3.4 Summary 



In conclusion, a shift of centers occurs only when the intended interpretation is well
supported by other contextual information, so that the speaker's intention is rarely
misinterpreted. If the speaker is concerned that her utterance might be misinterpreted as a
consequence of shifting the topic, she always has the alternative of expressing the intended
new topic overtly, as I originally hypothesized in (3) above. However, constraints that
arise from lexical semantics and the event structure appear to be readily available cues
that the hearer can use to interpret zeros with non-local antecedents. In the following
section, I will discuss how these observations can be incorporated into centering, and go
some way towards integrating centering with a model of global discourse structure (cf.
[Hobbs, 1985, Polanyi and Scha, 1984, Reichman, 1985, Grosz and Sidner, 1986]).



4 Integrating Centering and Global Coherence 



Although our initial hypothesis was that zeros would not be used to shift centers, we saw
above that this often happens in naturally occurring discourse. The relevant numbers are
repeated with the definition of the various centering transitions in Figure 3.⁹





                       Cb(U_i) = Cb(U_{i-1})     Cb(U_i) ≠ Cb(U_{i-1})

Cb(U_i) = Cp(U_i)      CONTINUE      76          SMOOTH-SHIFT   34

Cb(U_i) ≠ Cp(U_i)      RETAIN         3          ROUGH-SHIFT    23



Figure 3: Distribution of Centering Transitions with Zeros 
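
The definitions in Figure 3 can be written out as a small function. This is a minimal sketch, not an implementation taken from the centering literature: the function name `classify_transition` is hypothetical, entities are represented as plain strings, and the case of an undefined Cb is not handled.

```python
def classify_transition(cb_prev: str, cb_cur: str, cp_cur: str) -> str:
    """Classify the transition into U_i from Cb(U_{i-1}), Cb(U_i) and Cp(U_i)."""
    same_cb = cb_cur == cb_prev      # column of Figure 3
    cb_is_cp = cb_cur == cp_cur      # row of Figure 3
    if same_cb:
        return "CONTINUE" if cb_is_cp else "RETAIN"
    return "SMOOTH-SHIFT" if cb_is_cp else "ROUGH-SHIFT"


# Frequencies reported in Figure 3 (utterances containing at least one zero).
FIGURE_3_COUNTS = {
    "CONTINUE": 76,
    "SMOOTH-SHIFT": 34,
    "RETAIN": 3,
    "ROUGH-SHIFT": 23,
}

# (15)g: the Cb stays HEATER FACTORY, but the Cp is the investment money.
print(classify_transition("heater factory", "heater factory", "investment money"))
# (15)h: the Cb changes and differs from the Cp (T. Electron).
print(classify_transition("heater factory", "investment money", "T. Electron"))
```

Applied to the annotations given for (15)g and (15)h, the function returns RETAIN and ROUGH-SHIFT respectively, in agreement with the labels shown in the example.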



In the current algorithms of Centering Theory [Brennan et al., 1987, Walker et al., 1994],
interpretations are determined by the Cb and Cf in U_{i-1} and U_i (i.e. local discourse entities).
However, the observations above suggest that the theory must support an algorithm for
accessing non-local antecedents when a ROUGH-SHIFT transition occurs and a shift to a
non-local center is detected (cf. [Sidner, 1983]).



In order to capture global coherence, another center data structure must be added to keep
track of the Cbs introduced in the previous utterances. My data shows that zeros in
ROUGH-SHIFTS realize discourse entities that were previously realized as the Cb(U_{i-n}): there
are no cases where a zero realizes a discourse entity that was previously a non-Cb. Thus I propose
that what is needed is a Cb retrieval mechanism of some type to model the cases where a
zero is resolved to a discourse entity that was an earlier center.



This Cb retrieval mechanism could be based on the stack mechanism of [Sidner, 1983,
Grosz and Sidner, 1986], or the cache mechanism proposed in [Walker, 1996] and discussed
in (Walker, this volume). Since I have no evidence that anything more powerful than a 
list is required, the proposed algorithm is to search a linearly ordered list of former Cbs, 
ordered by recency. In all the cases in my data, it is sufficient to search back through a list 
of former Cbs ordered by recency and choose as the antecedent of the zero the first such 
Cb that is semantically compatible with the requirements of the zero. This mechanism for 
computing global coherence must interact with the centering algorithm for local coherence 
in such a way that the former is activated when the latter fails. The condition may be stated 
as follows. 



(17) If Cb(U_i) ≠ Cp(U_i), then take Cb(U_m) which is an element of M (i.e. Cb(U_m) ∈ M)
where M is a list of Cb(U_{1...(i-1)}) which satisfies the inference process.

When local coherence is not observed and the shift of the center is forced, the list of the
Cbs of the previous discourse, M, is searched, and each proposed Cb is checked against an
inference process based on lexical semantics and tense and aspect information, to determine
its adequacy. The algorithm for referring to the global discourse may be sketched as in (18).

9 Note that Figure 3 shows the frequency of each transition when an utterance contains at
least one zero.

(18) When a Cb shift is detected (i.e. Cp(U_i) ≠ Cb(U_i)):

     1. Local Coherence Check:
        if RETAIN and no ZTA-CONTINUE is available, go to Global Coherence Check
        if ROUGH-SHIFT, go to Global Coherence Check
        else return to Centering algorithm

     2. Global Coherence Check:
        Take a Cb(U_m) on the Cb list (i.e. Cb(U_m) ∈ M), and
        employ inference systems

     3. Decision:
        if the interpretation Cp(U_i) = Cb(U_m) is acceptable, return to Centering algorithm
        else return to Global Coherence Check and try the next Cb on the Cb list
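
The control flow of (18) might be rendered roughly as follows. This is a sketch under several assumptions rather than a definitive implementation: the function name `resolve_zero` and its parameters are hypothetical, the inference systems are abstracted into a caller-supplied predicate `is_acceptable`, and returning `None` when the Cb list is exhausted is my own choice, since (18) does not say what happens in that case.

```python
from typing import Callable, Optional, Sequence


def resolve_zero(
    transition: str,
    zta_continue_available: bool,
    former_cbs: Sequence[str],
    is_acceptable: Callable[[str], bool],
) -> Optional[str]:
    """Return a former Cb to serve as Cp(U_i), or None.

    former_cbs is the list M of earlier Cbs, most recent first.
    is_acceptable stands in for the inference process (lexical semantics,
    tense and aspect, agreement); its content is an assumption of the caller.
    """
    # Step 1: Local Coherence Check.
    needs_global = transition == "ROUGH-SHIFT" or (
        transition == "RETAIN" and not zta_continue_available
    )
    if not needs_global:
        return None  # control returns to the ordinary centering algorithm

    # Steps 2 and 3: try each former Cb in order of recency.
    for cb in former_cbs:
        if is_acceptable(cb):
            return cb  # interpretation Cp(U_i) = Cb(U_m) is accepted
    return None  # assumption: no former Cb passed the inference check


# Toy run for (15)h. The predicate hard-codes the reading the chapter prefers
# (past tense points back to T. Electron); it is illustrative only.
former_cbs_15 = ["heater factory", "T. Electron"]
antecedent = resolve_zero(
    "ROUGH-SHIFT", False, former_cbs_15, lambda e: e == "T. Electron"
)
print(antecedent)
```

A plain list searched by recency is used here because the text reports no evidence that anything more powerful is required; a stack or cache could be substituted without changing the interface.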



5 Discussion 



In this chapter, I discuss issues that Centering Theory needs to address in order to model
discourse coherence in a larger context. I argue that the use of zeros to realize previous
Cbs in RETAIN and ROUGH-SHIFT centering transition states indicates that coherence
information provides constraints on inferential processes. Future work must integrate these
observations with other studies on shifting centers. The data examined here show that lex- 
ical semantics as well as temporal/aspectual information are used to create links between 
non-local utterances, and that Centering theory can be extended to compute non-local dis- 
course coherence as long as it incorporates a richer semantic representation of utterances. I 
propose that the combination of the centering algorithm with a global Cb list captures some 
aspects of global coherence, without introducing a completely different module. This kind 
of mechanism suggests that it might be possible to use Centering as a part of an algorithm 
for inferring discourse structure. 